https://prezi.com/view/zYGal9DWEFulQ9FjeTOM/
#Introduction
One relaxing Friday evening several weeks ago, the three of us sat around a round table in Frary Dining Hall enjoying a long, pleasant dinner filled with scintillating conversation. As we talked, we began discussing our interests in music and, specifically, which era of music was best. Owen appreciated the ’80s, Will the ’00s, and Jacob the ’10s, each for different reasons. Owen claimed the ’80s was a “fun” and “happy” era of music, partially as a result of its minimal political overtones. Will agreed with Owen’s point but noted the wide variety of genres in the 2000s that had not previously been prevalent. Jacob argued that music in the 2010s has featured the most creativity and experimentation. As any Pomona students would, we determined that this Frary argument should be examined through the lens of academic, data-backed research, and our Computational Statistics semester project proved the perfect outlet.
As shown in the charts below, Owen may be correct about the 80’s, especially when compared to the 10’s!
Project Goals and General Methods
Of course, music preferences are subjective, and thus it is impossible to determine the “best” era of music. That said, characterizing a time period’s ideas and sentiments can be done fairly objectively through analyzing that time period’s music, and more specifically, the language and lyrics used in the songs of each era. Song lyrics are a key identifier when it comes to determining song popularity (Berger, 2018), and they provide an opportunity for rich analyses of their associated time period.
Hence, our group’s goal is to analyze popular song lyrics in the yearly Billboard Top 100 charts from 1970 to 2019 to determine and visualize word, concept, and sentiment popularity, as well as the dynamics of each. As discussed in the Data section below, the majority of our data is textual. Thus, many of the methods we use in our analyses are branches and applications of Natural Language Processing (NLP). More specifically, we used computational NLP techniques to examine word popularity, lyrical sentiment, and word uniqueness across the last five decades, along with several further techniques detailed below.
In order to collect, clean, wrangle, merge, and visualize the data, we used a variety of packages in both Python (regular expressions, BeautifulSoup, requests, spacy, etc.) and R (tidytext, tidyr, dplyr, ggplot, wordcloud2, etc.) and relied heavily on the Python-based Genius API.
Data
We had originally planned to use two datasets. One was from DataWorlds, provided by Sean Miller, and featured 50,000 Billboard top 100 songs, compiled every week from 1960-2019. The other was the MetroLyrics dataset of 300,000+ song lyrics from Kaggle. We wanted to merge these datasets by song, which proved to be a rather daunting task due to a number of factors. Although 300,000 songs is a large amount, there was not nearly as much overlap with the 50,000 Billboard songs as we expected. Additionally, there were a number of string-related differences that would have been very time consuming to standardize, stemming from differences in handling featured artists (sometimes in the Artist column, sometimes in the Song Title column, and with different connectors such as “feat., ft., x, &, featuring,” and sometimes with both artists’ names listed without any formal connector at all).
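To make the featured-artist problem concrete, here is a minimal sketch of how the connector variants listed above could be standardized with a regular expression. It is an illustration only, not the project's code: the function name `split_artists` and the exact connector list are assumptions, and it does not handle artists listed with no connector at all.

```python
import re

# Hypothetical pattern for the connectors named in the text:
# "feat.", "ft.", "featuring", "x", and "&", each surrounded by whitespace.
FEATURE_SEPARATOR = re.compile(
    r"\s+(?:feat\.?|ft\.?|featuring|x|&)\s+", flags=re.IGNORECASE
)


def split_artists(artist_field):
    """Return (primary_artist, [featured_artists]) in lowercase."""
    parts = [p.strip().lower() for p in FEATURE_SEPARATOR.split(artist_field)]
    parts = [p for p in parts if p]
    if not parts:
        return "", []
    return parts[0], parts[1:]
```

A caveat with this approach is that group names containing a connector (for example, a band with “&” in its name) would be split incorrectly, which is part of why the cleanup is so time consuming.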
We opted to search for a second plan after dabbling with regular expressions enough to realize this was not going to be fruitful. Enter the Genius API. We discovered through some in-depth research into precedent work that Genius is very generous in letting amateur data scientists access their API, which enabled us to make thousands and thousands of calls for free without being throttled (for the most part). Using the Python-based API, we were able to iterate through our database of 50,000 Billboard songs, which we split up by decade, and scrape song lyrics from Genius’s website for each song. This was no small undertaking. Each decade took, on average, about 3-5 hours to scrape. We would run these overnight, and eventually, after several days, we had collected all of our lyrical data. Using some basic cleaning procedures in R, we removed rows without lyrics and began preliminary analyses by generating basic word clouds. We noticed the appearance of several unusual characters (â, €, etc.). Despite toying with regular expressions in both R and Python, we were unable to clean the lyrics to the degree that we wanted.
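Characters like “â” and “€” often appear when UTF-8 text (for example, a curly apostrophe) is read as Windows-1252. The sketch below shows one common repair under that assumption; the cause of the stray characters in this dataset was not confirmed, and the function name `repair_mojibake` is hypothetical.

```python
def repair_mojibake(text):
    """Undo a UTF-8-read-as-Windows-1252 mix-up, if that is what happened.

    Assumption: the odd characters came from UTF-8 bytes decoded as cp1252.
    If the round trip fails, the text is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text
```

This will not catch every case (some byte values have no Windows-1252 character, so a few garbled sequences cannot be round-tripped), but it leaves already-clean lyrics untouched.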
We ultimately decided to pivot from this original weekly Billboard song dataset because the lesser-known songs in the database had compatibility issues with the Genius API. Our third and final plan involved starting from scratch. We decided to look only at yearly (instead of weekly) Billboard Top 100 data from 1970-2019. To do this, we began by using the requests and BeautifulSoup packages in Python to build a scraper that would output a data frame including song title, position, and year for our range of decades. We pulled this data from billboards.com and billboardtop100of.com. We then iterated through this data frame, once again using the Genius API, to compile lyrics for songs. However, we again noticed some unusual characters in some of our song lyrics - some of the “lyrics” were not lyrics! If the Genius API could not find the exact song, it would pick, at random, another page on its website and scrape that information instead. (In one case, we pulled an entire episode of Bojack Horseman instead of the intended song because of punctuation in the song title.) To fix this problem, we implemented a check that the title of the webpage was equal to the title of the song.
After all this work, our final dataset was complete: a data frame including Song Title, Artist, Song Position on Billboard, Lyrics, Year, Featured Artists, and the link to the page from which we pulled each song. Because our data files were so enormous, we hosted our data locally. Pictured below is what some of our final data frame looked like.
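A title check of this kind could look like the sketch below. It is a slightly more forgiving variant than strict equality, since it ignores case and punctuation before comparing; the helper names `normalize_title` and `is_same_song` are assumptions for illustration.

```python
import re


def normalize_title(title):
    """Lowercase a title, drop apostrophes, and collapse other punctuation."""
    lowered = title.lower().replace("'", "").replace("\u2019", "")
    return re.sub(r"[^a-z0-9]+", " ", lowered).strip()


def is_same_song(requested_title, returned_title):
    """True when the page returned by the search matches the requested song."""
    return normalize_title(requested_title) == normalize_title(returned_title)
```

Rows for which `is_same_song` is false would be discarded rather than stored as lyrics, so a mismatched page (such as a TV episode transcript) never enters the dataset.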
Final Data Frame
Procedure
After completing our dataset, our next steps involved pre-processing our lyrics for easier manipulation in R and feeding them into our natural language processing pipeline. The Python code for the process below can be found in the appendix.
We began by using the spacy package in Python on the raw lyrics. This process involved removing all punctuation and using regular expressions to make our lyrics alphanumeric and lowercase. We then used the spacy package’s core NLP cleaning functionality, with some online help from previous natural language processing projects involving song lyrics, to build a function that takes in our alphanumeric, lowercase lyrics and outputs a string containing the corpus, or main body of words, within each set of lyrics. The corpus excludes articles and other “Stop Words,” which were taken from an online dictionary and appended to our own list of words that we didn’t want to include, such as “Verse” and “Chorus.” This corpus was then parsed in order to create separate strings for verbs, nouns and adverbs (using a process referred to as lemmatization) and a value for word count within each corpus. These were all appended as columns to the original dataset, which was then output as a csv and imported into R. From here, we began our NLP processes, which are detailed below.
Our analysis included metrics of word frequency (by part of speech and decade), sentiment analysis, lexical diversity and density, bigrams, term frequency - inverse document frequency (TF-IDF) importance analysis, and Latent Dirichlet Allocation (LDA). Our visualization tools included word clouds, heatmaps, radar charts, bar graphs, and smooth line plots over time.
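The cleaning step described above can be sketched in plain Python as follows. This is a simplified stand-in, not the spacy pipeline itself: the stop word set here is a tiny hypothetical sample, and the sketch skips lemmatization and part-of-speech tagging, which spacy would provide.

```python
import re

# Hypothetical, abbreviated stop word list; the real list combined an online
# dictionary with custom entries such as "verse" and "chorus".
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "in", "verse", "chorus"}


def build_corpus(raw_lyrics):
    """Return (corpus_string, word_count) for one song's raw lyrics."""
    lowered = raw_lyrics.lower()
    # Keep only letters, digits, and whitespace.
    alphanumeric = re.sub(r"[^a-z0-9\s]", "", lowered)
    kept = [tok for tok in alphanumeric.split() if tok not in STOP_WORDS]
    return " ".join(kept), len(kept)
```

For instance, a section label such as “[Chorus]” loses its brackets in the regular expression step and is then dropped as a stop word, leaving only the lyric content.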
Results and Findings [Please be aware that crude language is included in this analysis]
Word Clouds
Word Clouds are designed to visualize the words appearing with the highest frequency. We believe that this visualization best displays the most commonly used words in our song lyrics dataset on a total and per-decade basis and is useful in determining broad changes in lyrical language over time. In order to create this visualization, we first create the ‘Corpus’ column of our data, which removes meaningless words like articles and prepositions, in order to better highlight the more impactful words (nouns, verbs, and adjectives) of songs. We then tokenize (break up by word) our existing song data to only consider each “Corpus” word and the number of times that word appears in our fifty-year time period. We create an initial word cloud reflecting the top ten popular words of all time.
Then, we separate the data by decade to determine if certain words have popularity peaks in different eras. Consistent across all decades, love is the most commonly used word in our lyrical database. At the turn of the century, we notice there is a rise in profanity and offensive terminology. Otherwise, the word clouds organized by decade show no other major changes in word frequency.
##Top 10 Words and Heat Map
To see whether artists’ popular language changes across the decades, we plot the top ten most commonly used words from 1970 to 2019. To build the plot, we examine our complete tokenized word list to determine the ten most-used words over the past fifty years. We then count the instances in which those words were used in each distinct year and plot the results as a line graph below.
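The counting behind this plot can be sketched as below: first find the most-used words overall, then tally those words separately for each year. The function name `top_words_by_year` and the `(year, corpus)` input shape are assumptions for illustration.

```python
from collections import Counter, defaultdict


def top_words_by_year(rows, k=10):
    """rows: iterable of (year, corpus_string) pairs, one per song.

    Returns (top_words, yearly), where yearly[year][word] is how often
    each overall top-k word was used in that year.
    """
    overall = Counter()
    per_year = defaultdict(Counter)
    for year, corpus in rows:
        tokens = corpus.split()
        overall.update(tokens)
        per_year[year].update(tokens)
    top_words = [word for word, _ in overall.most_common(k)]
    yearly = {
        year: {word: counts[word] for word in top_words}
        for year, counts in per_year.items()
    }
    return top_words, yearly
```

Each inner dictionary corresponds to one x-position on the line graph, with one line per top word.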
This graph provides trends of popular words over time, and we observe that “love” dominates our dataset, although it declines in frequency as we approach the present era. In the late 1980s and early 2000s, we see “baby” and “girl,” respectively, begin to gain more traction, indicating a shift in music to a slightly more diverse set of lyrics. Providing an even better visual of these changes is the heat map below, where the lighter shade indicates higher popularity.
Sentiment Radar Graph By Decade
To gain a deeper understanding of how general musical moods have changed over time, we run a sentiment analysis on our song lyrics over the past five decades. In carrying out this analysis, we note that oftentimes a singular word lacks requisite context to provide a proper analysis of its usage. So, we use a method that aggregates the sentiments of all words in a decade to better discover overall themes.
We use the R package ‘tidytext’ to implement a dictionary using the NRC (National Research Council Canada) sentiment analysis dataset, which places given words into ten buckets: joy, fear, disgust, anticipation, anger, trust, surprise, sadness, positive, and negative. Note that the figures below feature only eight buckets. Since we evaluate positive and negative sentiment later, we removed those buckets from our NRC sentiment analysis. We joined our summarized dataset by word with this dictionary in order to determine the percentage values for each sentiment within each corpus for each decade, using the number of words in each bucket as a percentage of the total number of words. We then plot the percentage values on five distinct radar charts, one per decade. Below is an example of code we use to calculate the 1970s radar chart.
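    library(dplyr); library(tidytext); library(radarchart)  # radarchart provides chartJSRadar()
    song70_nrc <- song_tidy70 %>%
      inner_join(nrc, by = "word") %>%  # nrc: the NRC lexicon, e.g. from get_sentiments("nrc")
      filter(!sentiment %in% c("negative", "positive")) %>%  # keep the eight emotion buckets
      group_by(sentiment) %>%
      summarise(Sent_Count = sum(n)) %>%  # n is the per-word count in the 1970s data
      mutate(Sentiment = (Sent_Count / sum(Sent_Count)) * 100)  # bucket share, in percent
    radar_chart_70s <- song70_nrc %>%
      select(-Sent_Count) %>%
      chartJSRadar(showToolTipLabel = TRUE, main = "1970s Sentiment")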
As seen on the charts below, we notice the sentiment distributions for the 70’s, 80’s and 90’s are rather similar, underscored with high levels of joy and moderate levels of anticipation and trust. Lyrical sentiment begins to change at the turn of the century, as the sentiment levels for anger, fear, disgust and sadness all begin to spike. This is once again indicative of the phenomenon observed in previous visualizations documenting the decline of the word “love” and the rise of more profane words. Overall, songs appear to increase in sentiment complexity over time (the red area grows).
Positive and Negative Word Clouds
To further explore and verify what was leading to the rise in anger, fear, disgust and sadness in the twenty-first century, we continue our sentiment analysis to observe how positively and negatively-rated words change over the decades. Through word cloud visualization, we can see which negative words contribute most to the rise in negative sentiment in the 2000s and 2010s.
We use the ‘tidytext’ package as before, but this time decide to use the “bing” sentiment analysis (puts words into two buckets: positive or negative) method instead of NRC (puts words into ten buckets as listed above) and additionally use the ‘wordcloud2’ package to plot the resulting word clouds. As shown below, the word clouds are separated in half, with positive words on the lower half and negative words on the upper half. The first plot is for the 1970s, the second for the 2010s.
“Love” is prominently displayed in the middle of all word clouds, once again emphasizing its importance. As expected, we see a rise of profanity, aggression, and other offensive terms (which classify as negative sentiment) in the 2000s and 2010s, contributing to the previously noticed sentiments of anger, sadness, and disgust. Possible reasons for such agitation could include the dawn and duration of a war from 2001 to 2011, the bursting of the tech bubble in the early 2000s, the mortgage meltdown financial crisis of 2008, and political polarization in the 2010s.
Lexical Diversity By Year
Lexical diversity describes the use of unique words over time. “Unique” words are defined as the number of distinct words, across all songs, in a given time frame (each year, in our case). The visualization below communicates how the complexity of song lyrics has changed over time, i.e., with more unique words comes more song lyric complexity. From our sentiment analysis radar charts above, we recall that the moods of songs have grown more complex (the red area of the radar charts has grown). Therefore, we expect to see an overall increase in lexical diversity. It would make sense that the more complex lyrics are, the more moods and sentiments they communicate. Again, we use our tokenized word list and count the number of unique words used each year by counting the number of rows per year. We then plot these results over time.
We see a slight upward trend in lexical diversity, meaning song lyrics have grown more complex over the past fifty years, in terms of the number of unique words in a given year. This is consistent with our sentiment analysis findings.
Lexical Density By Year
Lexical density, the ratio of unique words to total words in a given time period, pairs well with lexical diversity. Lexical density analysis allows us to observe lyrical complexity as before, but now relative to the total number of words used (across all songs) per year. Again, we find the number of unique words per year by counting the number of rows in each respective year, then divide by the total count (n) of all words used that year.
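Both measures reduce to simple counts per year, as the sketch below shows: diversity is the number of distinct words, and density is that number divided by the total number of words. The function name `lexical_stats` and the `(year, corpus)` input shape are assumptions for illustration.

```python
def lexical_stats(rows):
    """rows: iterable of (year, corpus_string) pairs, one per song.

    Returns {year: (diversity, density)} where diversity is the number of
    unique words that year and density is unique words / total words.
    """
    tokens_by_year = {}
    for year, corpus in rows:
        tokens_by_year.setdefault(year, []).extend(corpus.split())

    stats = {}
    for year, tokens in tokens_by_year.items():
        unique = len(set(tokens))
        total = len(tokens)
        density = unique / total if total else 0.0
        stats[year] = (unique, density)
    return stats
```

Because density divides by the total word count, a year can gain unique words and still see its density fall if songs grow longer or more repetitive.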
The results yield a downward trend, indicating that the number of total words used in songs has increased even more drastically than the number of unique words has increased over time, implying that songs have become more verbose. Since the increase in the number of total song words outweighs the increase in the number of unique words, we believe songs have also become more repetitive over time.
Word Importance By Decade and Bigram Map
One major goal of this project is to see what lyrics make a song popular. In addition to the analyses above, we can also determine what words are most “important” to various songs in the Top 100 by answering the question: given a song in the Top 100, which words occur most in that song? To answer this question, we use a Term Frequency - Inverse Document Frequency (TF-IDF) methodology to rank words in terms of “importance.”
TF-IDF looks at the frequency of a certain term within a song and the number of songs in the dataset that the word appears in. A word of high “importance” would have a high frequency but only occur in a limited number of songs, indicating that the word was instrumental (pun intended) to the popularity of that song. A word of low “importance” could also have a high frequency but occur in many different songs, indicating that the word is sort of a “jack of all trades, master of none.” For example, words like “girl” would have low importance, because although “girl” has high frequency, it appears in many songs, so its inverse document frequency is low. To implement this system, we use the R package ‘tidytext’ and the function ‘bind_tf_idf()’, which automatically calculates word importance values.
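To show what the importance score measures, here is a small sketch of the TF-IDF calculation using the standard definition (term frequency times the natural log of total documents over documents containing the term). It is an illustration in Python rather than the tidytext code used in the analysis, and the function name `tf_idf` is an assumption.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: {song_id: list of tokens}. Returns {(song_id, word): score}."""
    n_docs = len(docs)
    # Number of songs in which each word appears at least once.
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    scores = {}
    for song_id, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        for word, n in counts.items():
            tf = n / total                            # share of this song's words
            idf = math.log(n_docs / doc_freq[word])   # rarity across songs
            scores[(song_id, word)] = tf * idf
    return scores
```

A word that appears in every song gets an inverse document frequency of log(1) = 0, so its score is zero no matter how often it is repeated, while a word repeated heavily in a single song scores high.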
As alluded to above, looking at singular word importance does not provide relevant information for our analysis due to the complexity of language and ambiguous contextualization of words. So, we decided to create bigrams (pairs of words) to help contextualize some of the lyrical language. Once again, we use the R package ‘tidytext’, and we tokenize the lyrics pairwise instead of singularly. In the following visualizations, we see that “love” is prominent in the first three decades considered, paired with multiple other words such as “baby” and “fall.”
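Pairwise tokenization simply slides a two-word window across each song. The sketch below illustrates the idea in Python (the analysis itself used tidytext); the function name `count_bigrams` is an assumption.

```python
from collections import Counter


def count_bigrams(tokens):
    """Count adjacent word pairs in a list of tokens from one song."""
    return Counter(zip(tokens, tokens[1:]))
```

Each key is a pair such as `("love", "baby")`, so the same frequency and importance calculations used for single words can be applied to pairs.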
Again, as we move to the turn of the century, the word importance chart becomes dominated by combinations of more profane words. To help visualize word importance, we observe a map of nodes showing how various words link to one another. Using the R package ‘ggraph’, we were able to plot bigrams of high importance. In the visualization, nodes that have many links are used in a variety of ways with multiple other words, proving their universal importance, whereas links on the outer edges of the map containing only two connections indicate bigrams that only prove to be important in a few individual cases. “Love” and “baby” have multiple important connections, but words like “free” only have one meaningful partner.
Even when using bigrams, the nuances of language are not always well-displayed. For example, one song of the 2000’s, “Crank That” by Soulja Boy, features a repetitive and self-referential chorus and verse and was featured twice in the Top 100. Therefore, because many of the phrases used in this song were unique to this song and repeated many times, it was given an extraordinarily high TF-IDF bigram importance value, earning it a spot on the top words of the decade. We attempted to remedy this issue by removing some of these problematic words but ultimately found that even after doing so, the bigrams would be replaced by other, equally unrefined words such as names of artists like “Justin” and “Timberlake,” so we remain limited as to how much information we can draw from this technique.
In an attempt to characterize the lyrical choices of what we deemed “top artists,” we isolated the 25 artists who have the most multi-year (and even multi-decade) occurrences of music across Billboards. We then aimed to examine their lexicon and see if there are any distinguishing features that could point to their success as musicians. However, despite attempts to model their top words, most important words, and most important bigrams compared to the global array of artists, we could not find any discernible differences between the corpuses used and could draw no serious conclusions for the visualizations we created (see appendix).
Latent Dirichlet Allocation and Time-Course Plots
Latent Dirichlet Allocation (LDA) is a probabilistic model used in Natural Language Processing that relies on unsupervised learning to group words into a pre-set number of clusters based on “relatedness.” The goal of this analysis is to group terms that commonly co-occur in songs into topics, which can be thoughtfully assessed more holistically with a human eye in order to draw meaning from the remote associations of words. We did the following for nouns, verbs, and adverbs.
We then created a “part of speech” placeholder that combined the three.
The algorithm first assigns words to a random topic and then iteratively reassigns each word to a new topic, calculating the probability that a word belongs to a given topic and also the probability that the document (in our case a song) could be classified to a particular topic (Lettier 2018). In order to formulate our document-term matrix, the necessary input to an LDA model, which includes rows that are our documents (songs) and columns that are each word in a corpus, we turn to Python’s ‘gensim’ and ‘nltk’ packages in order to tokenize and “stem” each word within each song. Stemming removes prefixes and suffixes from words that have the same base in order for them to be matched within and between songs.
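The document-term matrix described above can be sketched in plain Python as follows. This only illustrates the shape of the LDA input (songs as rows, vocabulary words as columns); it is not the gensim code used in the analysis, the function name `document_term_matrix` is an assumption, and the tokens are presumed to be stemmed already.

```python
def document_term_matrix(docs):
    """docs: list of token lists, one per song.

    Returns (vocabulary, matrix) where matrix[i][j] is how many times
    vocabulary[j] occurs in song i.
    """
    vocabulary = sorted({token for doc in docs for token in doc})
    column = {token: j for j, token in enumerate(vocabulary)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocabulary)
        for token in doc:
            row[column[token]] += 1
        matrix.append(row)
    return vocabulary, matrix
```

In practice most entries are zero, since any one song uses only a small part of the full vocabulary, which is why libraries typically store this matrix in a sparse form.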
From here, we are able to run the LDA model for a variety of different numbers of topics and numbers of words per topic. We eventually decide on a final number of topics: eight, with five terms per topic. Although there are ways to “optimize” the number of topics, we thought that manually evaluating the success by making sense of the words included in each topic was a better way of approaching our algorithmic selection. (We also attempted to use LDA modeling by decade but saw less success.) The below is code within our topic modeling function, which is a function that arranges topics (like passion) by words that are associated with that topic.
Ultimately, the eight groupings we decided on, which are detailed below, are classified based on our associations with the words included in each topic: “Passion/Longing, Party/Dance/Movement, Intimacy/Women, Sex/Objectification, Measurement of Clout, Impulsivity/Sadness, Commitment/Tether, Faith/Violence.”
For each topic, we then graph the occurrence of these themes over the course of our fifty year window. In the “Passion/Longing” grouping, we see an increase in occurrence over time, with a peak in the early 2000’s, before a steep decline in the late 2000’s and 2010’s.
For “Party/Dance/Movement,” there is an increase, followed by a plateau in the mid 1980’s, and then another large increase and peak in the late 2000s, before a final decrease in the 2010’s.
The “Intimacy/Women” topic begins high, declines in the 1980’s, and then ticks back up and peaks near the year 2000 before starting a steep descent through the 2000’s and 2010’s.
“Sex/Objectification” and “Measurement of Clout” follow a similar pattern, with a gradual increase over time and their steepest increase coming in the 2010’s.
“Impulsivity/Sadness” is bimodal in nature, with peaks in the early 1980’s and 2000’s and steep declines on either side, with a smaller decline between the peaks.
“Commitment/Tether” shows little increase over time until the late 2000’s, where it shows a sharp increase.
Lastly, “Faith/Violence” begins a gradual increase over the 1970s-1980s, with a plateau from 1990 to the late 2000’s, and a steep increase in the 2010’s.
In interpreting these results, we examine general trends in the types of songs that were in the Billboard Top 100 over time. Despite the upward trend in lexical diversity over time, we see that songs included in the Billboard Top 100 may have increasing messages of superficiality, with fewer trends dealing with emotive, love-oriented concepts. We see, instead, a shift away from songs that deal with loved-ones to a stronger focus on social-status, objectification, violence, and a relationship viewpoint that prioritizes comrades over romantic partners, noting the tethering nature of intimacy.
These results parallel those of our sentiment analysis, which find high levels of joy across decades and feature greater amounts of anger, fear and disgust in the 2010’s relative to other decades. It is apparent that utilizing LDA does offer a more nuanced way of analyzing sentiment beyond using pre-existing dictionaries of words. These observed trends, which require human intervention for their formulation, are not without bias. Additionally, the trends above may serve as poor forecasts for the overall trend in music and instead may be limited to the songs featured in the Billboard Top 100.
Work Limitations
Our findings are indubitably interesting, yet we are also excited about a few potential next steps. First, connotations of words in the English language have certainly changed over the past fifty years. For example, “sick” only started meaning “cool” around the mid-1980s as a result of West Coast skate/surf culture. Thus, the word “sick” could have both a positive (if it means “cool”) and a negative (if it means “ill”) sentiment in songs after the mid-1980s. Creating sentiment analysis data by decade and comparing to song lyrics by decade would alleviate a good portion of this issue. Our current sentiment analysis charts above show a fairly significant increase in anger, disgust, and sadness from the early 2000s to the present, mainly due to an increase in profane, negative, and aggressive words. While this increase in negative sentiment could be reflective of the general economic and political agitation of the 2000s and 2010s, it could also be the case that the use of profanity and such aggressive language is more common and socially acceptable now than it was in the late twentieth century. Therefore, breaking down sentiment by decade would provide interesting and more accurate analyses of the respective decade sentiments.
Second, we would like to use song lyrics to classify songs by genre. For example, we found an extremely interesting interactive site (https://pudding.cool/2017/09/hip-hop-words/) that outputs the “most Hip-Hop” (like “stunting”) and “least Hip-Hop” (like “desire”) words, among other things. If we could acquire data on the genre of each of the songs in our dataset, we could create these “most [insert genre]” and “least [insert genre]” lists over the last fifty years, which would be fascinating. Neither the websites we scraped nor the Genius API contained data for song genre. We tried to utilize the iTunes API, which did have genre data, but the iTunes API had embedded restrictions that would have made the process extremely inefficient and computationally challenging. Given more time, we would be interested in scraping the genre data and breaking down the above outputs over the last five decades to analyze genre and genre word popularity over time.
Lastly, it would be interesting to gather a larger dataset of songs in and out of the Billboard Top 100 to analyze the differences between charted (Top 100) and uncharted (not Top 100) songs. Opposite our analysis above, perhaps there are certain words used in songs that significantly prevent a song from being in the Top 100. We could also use rankings of the charted songs to interpret word differences between songs in the top and bottom quartiles.
Ethical Considerations
We encountered certain ethical considerations almost immediately in our research and results. As seen above, some of the most popular song lyric words over the years are profane and can certainly be deemed offensive. For example, words like the n-word and a derogatory term used to describe women have become increasingly popular over time in song lyrics, and would appear in our word cloud. As a group, we found it difficult to draw an exact line for which of these kinds of words to keep and which to exclude. Ultimately, we decided that since profanity and similarly offensive words are useful in identifying sentiment and therefore helpful to our study, we would include such words in our results, but not without a warning of such occurrences.
In addition, our group recognizes that Billboard calculates its “Top 100” songs based on the sales, radio play, and online streaming rankings of all songs (Billboard, 2018). As a result, these Top 100 popularity charts are indicative mostly of the song preferences of a portion of the population that is able to spend the time and money to purchase and listen to these songs. As a result, while we can certainly communicate interesting findings of song lyric popularity over time, we must be careful when generalizing to the population, since there exist less privileged groups whose preferences may not be reflected in the Top 100.
Conclusion
Overall, popular song lyrics over the past fifty years are able to tell so much beyond just what makes a song popular. We were able to make conclusions and advanced analyses about popular words, significant sentiments, song complexity, and related concepts and topics over time. Our group really enjoyed working together on this project and remains ambitious to explore our next steps!
APPENDIX
Data Acquisition and Cleaning
Word Importance By Timeless Artists
Timeless artists, according to the data, included performers such as Taylor Swift, Michael Jackson, Elton John, Cher, Mariah Carey, etc.